A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication
The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation, which eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially and behaves in accordance with the Roofline model. Outliers are discussed and analyzed in detail. While we focus on SymmSpMV in this paper, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.
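To make the "conflicting updates on the result vector" concrete, here is a minimal serial sketch of SymmSpMV. It is not taken from the RACE library. It assumes that only the upper triangle of the matrix is stored in CSR form; the function name `symm_spmv` and the array names are hypothetical and chosen for illustration.

```python
def symm_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a symmetric A, given only its upper triangle in CSR form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, a = col_idx[k], values[k]
            # Contribution of the stored entry A[i][j] to row i.
            y[i] += a * x[j]
            if j != i:
                # Contribution of the mirrored entry A[j][i]. This write goes to
                # y[j], which belongs to a different row: if rows i and j were
                # processed by different threads, the two updates could collide.
                y[j] += a * x[i]
    return y


# Upper triangle of A = [[2, 1, 0], [1, 3, 4], [0, 4, 5]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = symm_spmv(row_ptr, col_idx, values, x)
print(y)
```

In this serial form the scattered write to `y[j]` is harmless. A coloring scheme groups rows so that no two rows processed concurrently touch the same entry of `y`, which is the dependency the abstract describes as resolvable by distance-k coloring.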
Implementation and Performance Engineering of the Kaczmarz Method for Parallel Systems
The Kaczmarz method is a simple and robust iterative solver for linear systems of equations. It is used in
different fields of science and engineering ranging from medical imaging to solving convection-dominated
flows, Helmholtz equations and eigenvalue problems. In this thesis we investigate hardware efficiency and
scalable shared-memory parallelization strategies for the Kaczmarz method when used as a solver for sparse
linear systems. The inherent data dependencies of this method prevent fine-grained parallelism like SIMD or
multi-threading from being used efficiently. However, there exist techniques like multicoloring which can enable
this level of parallelism. A critical analysis of the multicoloring approach both in terms of performance and
qualitative behavior reveals its deficiencies on modern compute platforms. Starting with existing ideas, this
thesis proposes a novel "block multicoloring" method, which leverages structural features of (partly) band- or
hull-structured matrices. A thorough node-level performance analysis demonstrates that this approach
outperforms traditional multicoloring significantly (up to 3x on a single compute node) for a selection of
relevant application matrices and never falls behind it even for malicious cases. Finally, our Kaczmarz
implementation combined with block multicoloring is used as a linear solver in the FEAST method, to
compute inner eigenvalues of large sparse matrices. These first results demonstrate the applicability of the
presented approach and indicate its superiority for large-scale computations as compared to direct solvers,
which are the state of the art for the FEAST method.
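As an illustration of the data dependencies mentioned above, the following is a minimal serial sketch of the classical Kaczmarz iteration for a sparse system in CSR form. It is not the thesis implementation and does not include multicoloring or block multicoloring; the function name `kaczmarz_sweep` and the small test system are assumptions made for this example.

```python
def kaczmarz_sweep(row_ptr, col_idx, values, b, x):
    """One in-place Kaczmarz sweep over all rows of a CSR matrix."""
    for i in range(len(b)):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        # Residual of equation i, using the current (already updated) x.
        dot = sum(values[k] * x[col_idx[k]] for k in range(lo, hi))
        norm_sq = sum(values[k] ** 2 for k in range(lo, hi))
        if norm_sq == 0.0:
            continue  # skip empty rows
        scale = (b[i] - dot) / norm_sq
        # Project x onto the hyperplane of equation i. Rows that share a
        # column index read and write the same entries of x, so they cannot
        # be processed concurrently without extra care.
        for k in range(lo, hi):
            x[col_idx[k]] += scale * values[k]
    return x


# A = [[2, 1], [1, 3]], b = [3, 4]; the exact solution is [1, 1].
row_ptr = [0, 2, 4]
col_idx = [0, 1, 0, 1]
values = [2.0, 1.0, 1.0, 3.0]
b = [3.0, 4.0]
x = [0.0, 0.0]
for _ in range(100):
    kaczmarz_sweep(row_ptr, col_idx, values, b, x)
print(x)
```

Each row update depends on the result of the previous one whenever the two rows share a column, which is why techniques such as multicoloring are needed before SIMD or multi-threading can be applied.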